Improved sparse approximation over quasi-incoherent dictionaries
This paper discusses a new greedy algorithm for solving the sparse approximation problem over quasi-incoherent dictionaries. These dictionaries consist of waveforms that are uncorrelated "on average," and they provide a natural generalization of incoherent dictionaries. The algorithm provides strong guarantees on the quality of the approximations it produces, unlike most other methods for sparse approximation. Moreover, very efficient implementations are possible via approximate nearest-neighbor data structure
One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers
Sketches are probabilistic data structures that can provide approximate
results within mathematically proven error bounds while using orders of
magnitude less memory than traditional approaches. They are tailored for
streaming data analysis, even on architectures with limited memory such as
single-board computers, which are widely used for IoT and edge computing.
Since these devices offer multiple cores, with efficient parallel sketching
schemes, they are able to manage high volumes of data streams. However, since
their caches are relatively small, a careful parallelization is required. In
this work, we focus on the frequency estimation problem and evaluate the
performance of a high-end server, a 4-core Raspberry Pi and an 8-core Odroid.
As a sketch, we employed the widely used Count-Min Sketch. To hash the stream
in parallel and in a cache-friendly way, we applied a novel tabulation approach
and rearranged the auxiliary tables into a single one. To parallelize the
process efficiently, we modified the workflow and applied a form of
buffering between hash computations and sketch updates. Today, many
single-board computers have heterogeneous processors that combine slow and
fast cores. To utilize all these cores to their full
potential, we proposed a dynamic load-balancing mechanism which significantly
increased the performance of frequency estimation.
Comment: 12 pages, 4 figures, 3 algorithms, 1 table, submitted to EuroPar'1
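As a rough illustration of the Count-Min Sketch that this abstract builds on, the following is a minimal single-threaded sketch in Python. It does not reproduce the paper's tabulation hashing, merged table, buffering, or load balancing; the class name, the default sizes, and the use of salted BLAKE2b as the per-row hash are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )
```

After calling `add("a")` three times, `estimate("a")` returns at least 3; any overestimate comes from hash collisions and shrinks as `width` grows.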
Systematic mapping review on student’s performance analysis using big data predictive model
This paper classifies the various existing predictive models that are used for monitoring and improving students’ performance at schools and higher learning institutions. It analyses all the areas within the educational data mining methodology. Two databases were chosen for this study and a systematic mapping study was performed. Because this research area is still in its infancy, only 114 articles published from 2012 to 2016 were identified. Of these, a total of 59 articles were reviewed and classified. There is increased interest and research in the area of educational data mining, particularly in improving students’ performance with various predictive and prescriptive models. Most of the models are ultimately devised for pedagogical improvements. There is a severe scarcity of portable predictive models that fit into any educational environment. More research is needed on educational big data.
Keywords: predictive analysis; student’s performance; big data; big data analytics; data mining; systematic mapping study
Spatially embedded random networks
Many real-world networks analyzed in modern network theory have a natural spatial element; e.g., the Internet, social networks, neural networks, etc. Yet, aside from a comparatively small number of somewhat specialized and domain-specific studies, the spatial element is mostly ignored and, in particular, its relation to network structure disregarded. In this paper we introduce a model framework to analyze the mediation of network structure by spatial embedding; specifically, we model connectivity as dependent on the distance between network nodes. Our spatially embedded random networks construction is not primarily intended as an accurate model of any specific class of real-world networks, but rather to gain intuition for the effects of spatial embedding on network structure; nevertheless we are able to demonstrate, in a quite general setting, some constraints of spatial embedding on connectivity such as the effects of spatial symmetry, conditions for scale free degree distributions and the existence of small-world spatial networks. We also derive some standard structural statistics for spatially embedded networks and illustrate the application of our model framework with concrete examples
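To make the idea of distance-dependent connectivity concrete, here is a minimal sketch in Python. It is not the paper's model framework: the uniform placement of nodes in the unit square, the exponential connection probability `exp(-d / scale)`, and the function and parameter names are all assumptions chosen for illustration.

```python
import math
import random


def spatial_random_graph(n, scale=0.2, seed=0):
    """Place n nodes uniformly in the unit square and connect each pair
    with a probability that decays with their Euclidean distance."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            # Assumed connection function: nearby nodes link with
            # probability close to 1, distant nodes rarely link.
            if rng.random() < math.exp(-d / scale):
                edges.append((i, j))
    return points, edges
```

Swapping the exponential for another decreasing function of `d` changes the resulting degree distribution, which is the kind of effect the abstract describes studying.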
Stochastic Budget Optimization in Internet Advertising
Internet advertising is a sophisticated game in which the many advertisers
"play" to optimize their return on investment. There are many "targets" for the
advertisements, and each "target" has a collection of games with a potentially
different set of players involved. In this paper, we study the problem of how
advertisers allocate their budget across these "targets". In particular, we
focus on formulating their best response strategy as an optimization problem.
Advertisers have a set of keywords ("targets") and some stochastic information
about the future, namely a probability distribution over scenarios of cost vs
click combinations. This summarizes the potential states of the world assuming
that the strategies of other players are fixed. Then, the best response can be
abstracted as stochastic budget optimization problems to figure out how to
spread a given budget across these keywords to maximize the expected number of
clicks.
We present the first known non-trivial poly-logarithmic approximation for
these problems as well as the first known hardness results of getting better
than logarithmic approximation ratios in the various parameters involved. We
also identify several special cases of these problems of practical interest,
such as with a fixed number of scenarios or with polynomial-sized parameters
related to cost, which are solvable either in polynomial time or with improved
approximation ratios. Stochastic budget optimization with scenarios has
sophisticated technical structure. Our approximation and hardness results come
from relating these problems to a special type of (0/1, bipartite) quadratic
programs inherent in them. Our research answers some open problems raised by
the authors in (Stochastic Models for Budget Optimization in Search-Based
Advertising, Algorithmica, 58 (4), 1022-1044, 2010).
Learning Best Response Strategies for Agents in Ad Exchanges
Ad exchanges are widely used in platforms for online display advertising.
Autonomous agents operating in these exchanges must learn policies for
interacting profitably with a diverse, continually changing, but unknown
market. We consider this problem from the perspective of a publisher,
strategically interacting with an advertiser through a posted price mechanism.
The learning problem for this agent is made difficult by the fact that
information is censored, i.e., the publisher knows if an impression is sold but
no other quantitative information. We address this problem using the
Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm, which conceptualises this
interaction in terms of a Stochastic Bayesian Game and arrives at optimal
actions by best responding with respect to probabilistic beliefs maintained
over a candidate set of opponent behaviour profiles. We adapt and apply HBA to
the censored information setting of ad exchanges. Also, addressing the case of
stochastic opponents, we devise a strategy based on a Kaplan-Meier estimator
for opponent modelling. We evaluate the proposed method using simulations
wherein we show that HBA-KM achieves substantially better competitive ratio and
lower variance of return than baselines, including a Q-learning agent and a
UCB-based online learning agent, and comparable to the offline optimal
algorithm
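For readers unfamiliar with the Kaplan-Meier estimator mentioned in this abstract, the following Python sketch computes the generic product-limit survival curve from right-censored samples. It is not the paper's HBA-KM method, and how observations in an ad exchange would map onto `(value, observed)` pairs is left open; the function name and input format are assumptions.

```python
def kaplan_meier(samples):
    """Product-limit estimate of the survival function.

    samples: list of (value, observed) pairs, where observed=False marks a
    right-censored sample (the true value is only known to exceed `value`).
    Returns a list of (value, survival_probability) at each observed value.
    """
    ordered = sorted(samples)
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        value = ordered[i][0]
        events = 0
        ties = 0
        # Group all samples that share the same value.
        while i < len(ordered) and ordered[i][0] == value:
            if ordered[i][1]:
                events += 1
            ties += 1
            i += 1
        if events:
            # Multiply by the fraction of at-risk samples that survive this value.
            survival *= 1.0 - events / at_risk
            curve.append((value, survival))
        # Both events and censored samples leave the risk set afterwards.
        at_risk -= ties
    return curve
```

Censored samples never produce a step in the curve, but they still count toward the risk set until their value is passed, which is how the estimator uses partial information.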
The Tree Inclusion Problem: In Linear Space and Faster
Given two rooted, ordered, and labeled trees $P$ and $T$, the tree inclusion
problem is to determine if $P$ can be obtained from $T$ by deleting nodes in
$T$. This problem has recently been recognized as an important query primitive
in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995]
presented the first polynomial time algorithm using quadratic time and space.
Since then several improved results have been obtained for special cases when
$P$ and $T$ have a small number of leaves or small depth. However, in the worst
case these algorithms still use quadratic time and space. Let $n_S$, $l_S$, and
$d_S$ denote the number of nodes, the number of leaves, and the depth
of a tree $S \in \{P, T\}$. In this paper we show that the tree inclusion
problem can be solved in space $O(n_T)$ and time
$O(\min(l_P n_T,\; l_P l_T \log \log n_T + n_T,\; \frac{n_P n_T}{\log n_T} + n_T \log n_T))$. This improves or
matches the best known time complexities while using only linear space instead
of quadratic. This is particularly important in practical applications, such as
XML databases, where the space is likely to be a bottleneck.
Managing Risk of Bidding in Display Advertising
In this paper, we deal with the uncertainty of bidding for display
advertising. Similar to financial market trading, real-time bidding (RTB)
based display advertising employs an auction mechanism to automate
impression-level media buying; and running a campaign is no different from an
investment in acquiring new customers in return for obtaining additional
converted sales. Thus, how to optimally bid on an ad impression to drive the
profit and return-on-investment becomes essential. However, the large
randomness of the user behaviors and the cost uncertainty caused by the auction
competition may result in a significant risk from the campaign performance
estimation. In this paper, we explicitly model the uncertainty of user
click-through rate estimation and auction competition to capture the risk. We
borrow an idea from finance and derive the value at risk for each ad display
opportunity. Our formulation results in two risk-aware bidding strategies that
penalize risky ad impressions and focus more on the ones with higher expected
return and lower risk. The empirical study on real-world data demonstrates the
effectiveness of our proposed risk-aware bidding strategies: yielding profit
gains of 15.4% in offline experiments and up to 17.5% in an online A/B test on
a commercial RTB platform over the widely applied bidding strategies
On Exchange of Orbital Angular Momentum Between Twisted Photons and Atomic Electrons
We obtain an expression for the matrix element for a twisted
(Laguerre-Gaussian profile) photon scattering from a hydrogen atom. We consider
photons incoming with an orbital angular momentum (OAM) of $\ell\hbar$,
carried by a factor of $e^{i\ell\phi}$ not present in a plane-wave or pure
Gaussian profile beam. The nature of the transfer of units of OAM from
the photon to the azimuthal atomic quantum number of the atom is investigated.
We obtain simple formulae for these OAM flip transitions for elastic forward
scattering of twisted photons when the photon wavelength is large
compared with the atomic target size, and small compared with the Rayleigh range,
which characterizes the collimation length of the twisted photon beam.
Cross-Document Pattern Matching
We study a new variant of the string matching problem called cross-document
string matching, which is the problem of indexing a collection of documents to
support an efficient search for a pattern in a selected document, where the
pattern itself is a substring of another document. Several variants of this
problem are considered, and efficient linear-space solutions are proposed with
query time bounds that either do not depend at all on the pattern size or
depend on it in a very limited way (doubly logarithmic). As a side result, we
propose an improved solution to the weighted level ancestor problem